Computation Model: Synchronous Data Flow
All systems with the computation model Synchronous Data Flow.
Systems (17)
AWS Trainium
f(x) = High-throughput transformer and large-scale neural network training across sparse and dense workloads on EC2 Trn1 clusters.
AWS's custom training chip powering EC2 Trn1 instances, pairing high-throughput NeuronCore compute with Elastic Fabric Adapter (EFA) networking for massive multi-node synchronization across dense and sparse machine learning workloads.
Google TPU v1
f(x) = datacenter AI inference acceleration
Google's first TPU, announced in 2016, ties a large 256×256 systolic array built for dense matrix multiplies to local weight memory, so inference workloads across Google data centers run deterministically at high throughput.
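A minimal sketch (in NumPy, not TPU microcode) of the weight-stationary dataflow such a systolic array implements: weights stay pinned in the PE grid while activations stream through and partial sums accumulate along columns; cycle-level skew is abstracted away.

import numpy as np

def systolic_matmul(a, w):
    # Weight-stationary schedule: PE (k, n) holds w[k, n] for the whole
    # computation. Each activation row of `a` streams across the grid and
    # its partial sums ripple down the columns until a finished dot
    # product exits the bottom. (Pipeline skew is not modeled.)
    m_dim, k_dim = a.shape
    _, n_dim = w.shape
    out = np.zeros((m_dim, n_dim))
    for i in range(m_dim):            # one activation wavefront at a time
        acc = np.zeros(n_dim)         # partial sums flowing down columns
        for k in range(k_dim):        # PE rows with stationary weights
            acc += a[i, k] * w[k, :]  # multiply, add, forward downward
        out[i, :] = acc
    return out

a, w = np.random.randn(4, 8), np.random.randn(8, 3)
assert np.allclose(systolic_matmul(a, w), a @ w)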
Google TPU v2
f(x) = AI training and inference acceleration
Google's second-generation TPU v2 is a datacenter-scale AI accelerator built around large systolic arrays, high-bandwidth memory, and bfloat16 matrix units, forming Cloud TPU v2 pods to deliver high-throughput training and inference.
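A small illustration of what bfloat16 trades away: it keeps float32's 8-bit exponent but only 7 mantissa bits, so matrix-unit inputs lose precision while accumulation stays in float32. A rough software emulation (round-to-nearest-even on the top 16 bits of a float32; finite inputs assumed):

import numpy as np

def to_bfloat16(x):
    # Keep float32's sign and 8 exponent bits, round the 23-bit mantissa
    # down to bfloat16's 7 bits (round-to-nearest-even), then widen back
    # to float32 for inspection.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> 16) & 1
    rounded = bits + 0x7FFF + lsb
    return (rounded & 0xFFFF0000).view(np.float32)

x = np.array([1.2345678, 3.1415927], dtype=np.float32)
print(to_bfloat16(x))   # ~[1.234375, 3.140625]: about 3 decimal digits survive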
Google TPU v3
f(x) = AI training and inference acceleration
Third-generation Google TPU pairs bfloat16 matrix-multiply units (accumulating in float32) with HBM, and Cloud TPU v3 pods deliver up to 8x the performance of v2 pods, providing massive training and inference acceleration.
Google TPU v4
f(x) = dense matrix multiply and transformer attention pipelines
Google TPU v4 is a fourth-generation pod-scale accelerator that deterministically realizes dense linear algebra and transformer attention via custom systolic arrays. Each TPU v4 die pairs stacked HBM with its matrix-multiply units.
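To make "attention as dense linear algebra" concrete, here is single-head scaled dot-product attention written as the matmuls (plus a softmax) that a pod of systolic arrays would execute; shapes and sizes are illustrative.

import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: two dense matmuls around a row-wise
    # softmax -- exactly the matrix work a systolic array is built for.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq, d = 16, 64
q, k, v = (np.random.randn(seq, d) for _ in range(3))
assert attention(q, k, v).shape == (seq, d)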
Google TPU v5
f(x) = High-throughput tensor acceleration for deep learning training and inference
Google's fifth-generation TPU (v5) is a datacenter AI accelerator optimized for massive matrix multiplies; each chip exposes more matrix units than v4, and when assembled into TPU v5 pods it delivers high aggregate training and inference throughput.
Groq Tensor Streaming Processor
f(x) = deterministic, statically scheduled tensor streaming for deep learning inference
The Groq Tensor Streaming Processor executes statically scheduled tensor programs with cycle-deterministic timing, so ML inference workloads observe predictable latency through its massively pipelined data flows.
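A toy illustration of the compile-time-scheduled style (ops and cycle counts are hypothetical, not Groq's ISA): the schedule fixes which operation runs on which cycle, so end-to-end latency is known exactly before execution starts.

# The "compiler" emits a fixed cycle-by-cycle schedule; execution just
# replays it -- no queues, no arbitration, no cache misses to vary timing.
schedule = [
    (0, "load",  lambda s: s.update(x=[1.0, 2.0, 3.0])),
    (1, "scale", lambda s: s.update(x=[2.0 * v for v in s["x"]])),
    (2, "sum",   lambda s: s.update(y=sum(s["x"]))),
]

state = {}
for cycle, name, op in schedule:
    op(state)
    print(f"cycle {cycle}: {name} -> {state}")
# Latency is exactly len(schedule) cycles on every run, for every input.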
Intel Xe-HPC
f(x) = Dense HPC GPU acceleration for AI training, scientific simulation, and matrix algebra
Ponte Vecchio GPUs combine HBM2e stacks, Xe cores with wide vector and XMX matrix engines, and a multi-tile chiplet design fabricated across several process nodes, packing thousands of SIMT lanes per tile and coordinating them through EMIB bridges and Foveros 3D stacking.
Lightmatter Passage
f(x) = photonic inference
Lightmatter Passage is an optical AI accelerator that performs light-based matrix multiplies, driving an optical dataflow through a waveguide matrix engine.
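One way to see how a mesh of interferometers can realize an arbitrary matrix (a numerical sketch, not Lightmatter's design): factor the matrix by SVD into two unitaries, which map onto interferometer meshes, and a diagonal, which maps onto per-channel attenuation or gain.

import numpy as np

# M = U @ diag(s) @ Vh: Vh and U are unitary (realizable as meshes of
# 2x2 Mach-Zehnder interferometers); diag(s) is per-channel gain/loss.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
U, s, Vh = np.linalg.svd(M)

x = rng.standard_normal(4)      # input vector encoded as optical amplitudes
y = U @ (s * (Vh @ x))          # three optical stages applied in flight
assert np.allclose(y, M @ x)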
Luminous Computing
f(x) = photonic logic for AI
Luminous Computing centers on photonic logic for AI, building coherent-light neural accelerators orchestrated via optical dataflow.
MIT Tagged-Token Dataflow Architecture
f(x) = Id dataflow semantics with tag-based dynamic scheduling
MIT Tagged-Token Dataflow Architecture pairs tag-based dynamic scheduling with tagged token contexts that encode activation frames, letting distributed execution units match tokens by tag, dispatch operands, and fire instructions as soon as all inputs arrive.
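A toy matcher showing the tagged-token firing rule (instruction names and token format are illustrative): an instruction fires only when tokens carrying the same tag have arrived on both input ports, so separate loop iterations interleave freely.

from collections import defaultdict

waiting = defaultdict(dict)     # (instruction, tag) -> {port: value}
ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def inject(instr, tag, port, value):
    # A token is (instr, tag, port, value); `tag` names the activation
    # frame / loop iteration this operand belongs to.
    slot = waiting[(instr, tag)]
    slot[port] = value
    if len(slot) == 2:          # both operands matched: fire
        print(f"{instr}[tag={tag}] fired ->", ops[instr](slot[0], slot[1]))
        del waiting[(instr, tag)]

# Tokens from two loop iterations arrive interleaved and out of order:
inject("add", tag=0, port=0, value=1)
inject("add", tag=1, port=0, value=10)
inject("add", tag=1, port=1, value=20)  # iteration 1 fires first
inject("add", tag=0, port=1, value=2)   # then iteration 0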
Manchester Dataflow Machine
f(x) = fine-grained token-driven computation
The Manchester Dataflow Machine, designed at the University of Manchester in the late 1970s and operational by the early 1980s, emphasized token-based dataflow execution: tokens flow through FIFO queues, a matching store pairs operands bound for the same instruction, and operations fire out of order as soon as their operands arrive.
Normal Computing stochastic processing units
f(x) = Unconventional analog thermodynamic inference
Normal Computing's stochastic processing units leverage probabilistic analog circuits with thermodynamic noise shaping and memristive elements to accelerate AI inference workloads while embracing physical randomness as a computational resource.
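A sketch of the computational style such hardware targets, with noise doing the useful work (parameters hypothetical, not Normal Computing's circuits): overdamped Langevin dynamics whose injected noise makes the state wander according to a target distribution, here a standard Gaussian with energy E(x) = x²/2.

import numpy as np

rng = np.random.default_rng(1)
x, dt, steps = 0.0, 0.01, 200_000
samples = np.empty(steps)
for t in range(steps):
    grad = x                                    # dE/dx for E(x) = x^2 / 2
    x += -grad * dt + np.sqrt(2 * dt) * rng.standard_normal()
    samples[t] = x

print(samples.mean(), samples.std())            # approaches 0.0 and 1.0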
SambaNova RDU (Reconfigurable Dataflow Unit)
f(x) = AI training and inference dataflow graphs
Reconfigurable Dataflow Units implement granular dataflow graphs by combining configurable tiles with per-tile scheduling and streaming data paths. Each tile bundles compute arrays, SRAM buffers, and switch-fabric links that stream intermediate results directly to the next stage.
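A toy version of the streaming-tile idea (tile roles are illustrative, not SambaNova's programming model): each stage is a generator, and values flow tile-to-tile without round-tripping through shared memory.

def load_tile(data):                    # tile 1: feed operands into the fabric
    for x in data:
        yield x

def mac_tile(stream, weight, bias):     # tile 2: multiply-accumulate in place
    for x in stream:
        yield weight * x + bias

def relu_tile(stream):                  # tile 3: activation on the stream
    for x in stream:
        yield max(0.0, x)

pipeline = relu_tile(mac_tile(load_tile([-2.0, -1.0, 0.5, 3.0]), 2.0, -1.0))
print(list(pipeline))                   # [0.0, 0.0, 0.0, 5.0]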
Tenstorrent Grayskull
f(x) = AI inference acceleration
Tenstorrent Grayskull is a tile-based architecture of Tensix compute cores, each pairing matrix math engines with local SRAM, delivering massive data-parallel tensor throughput across the grid of cores.
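A block-partitioned matmul hints at how work maps onto such a grid (a NumPy sketch, not Tenstorrent's kernel format): each output tile is an independent unit of work for one core, with operand tiles staged through that core's local SRAM.

import numpy as np

def tiled_matmul(a, b, tile=4):
    # Each (i, j) output tile can be assigned to one compute core; the
    # core streams K-dimension tiles through local memory and accumulates.
    # Dimensions are assumed divisible by `tile`.
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):         # one core per output tile
            for p in range(0, k, tile):     # accumulate over the K tiles
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile])
    return out

a, b = np.random.randn(8, 8), np.random.randn(8, 8)
assert np.allclose(tiled_matmul(a, b), a @ b)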
Tenstorrent Wormhole
f(x) = AI accelerator
Tenstorrent Wormhole is a multi-chip module designed for large language models, providing high-bandwidth Ethernet-based chip-to-chip interconnect and integration with the Tenstorrent software stack.
UPMEM PIM
f(x) = parallel search and graph analytics near memory
UPMEM Processing-In-Memory DIMMs combine DRAM banks with embedded RISC DPUs, enabling datacenter-scale parallel search and graph analytics without moving data back to host CPUs.
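A toy model of the near-memory pattern (threads stand in for DPUs; the striping and search predicate are illustrative): each worker scans only its own bank and ships back match coordinates, so bulk data never crosses to the host.

from concurrent.futures import ThreadPoolExecutor

def dpu_search(bank_id, bank, needle):
    # Runs "inside" the DIMM: touches only this bank's local data.
    return [(bank_id, i) for i, v in enumerate(bank) if v == needle]

data = list(range(1000)) * 4
banks = [data[i::8] for i in range(8)]          # stripe across 8 banks

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda args: dpu_search(*args),
                            [(b, banks[b], 777) for b in range(8)]))
hits = [h for part in results for h in part]
print(hits)     # only (bank, offset) pairs cross back to the "host"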